CLAIMS 

What is claimed is: 

1. A method for web content filtering between web transmission contact 
nodes, comprising executing the following steps to determine whether web 
information is to be bypassed: 

(1A) building web page filtering decision criteria, at least including a keyword 
category, a relevance probability chart for every keyword, a blocking threshold, 
a bypassing threshold, and a score deviation (SD); 

(1B) getting a web page from the web server; 

(1C) looking for the next keyword; 

(1D) determining whether the current word is a keyword; if yes, proceeding 
to the next step; if not, going to step (1H) to continue checking the information 
document; 

(1E) re-computing the score deviation between the highest score and the second 
highest score among the categories based on the relevance probability chart; 

(1F) determining whether the score deviation exceeds the blocking threshold; if 
yes, labeling the web page as a forbidden one; if not, proceeding to the next step; 

(1G) determining whether the score deviation is lower than the bypassing 
threshold; if yes, labeling the web page as a bypassing one; if not, proceeding to 
the next step; and 

(1H) reading the next word from the web page and determining whether the end 
has been reached; if yes, labeling the web page as a bypassing one; if not, 
returning to step (1D). 
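
As an illustration only, the loop of steps (1B) to (1H) can be sketched in Python as follows. The function name classify_page, the dictionary layout of the relevance probability chart (keyword to per-category weight), the additive scoring rule, and the use of a callable for the bypassing threshold are assumptions made for this sketch and are not recited in the claim.

```python
def classify_page(words, chart, block_threshold, bypass_threshold):
    """Label a page "forbidden" or "bypass" (hypothetical helper).

    words: iterable of words from the fetched page (step 1B).
    chart: assumed layout {keyword: {category: weight}}.
    bypass_threshold: callable taking the keyword match count so far.
    """
    # Assumption: every category named in the chart starts at score 0.
    scores = {c: 0.0 for weights in chart.values() for c in weights}
    matches = 0
    for word in words:                      # steps (1C)/(1H): next word
        weights = chart.get(word)
        if not weights:                     # step (1D): not a keyword
            continue
        matches += 1
        for category, weight in weights.items():
            scores[category] += weight      # assumed additive scoring
        ranked = sorted(scores.values(), reverse=True)
        second = ranked[1] if len(ranked) > 1 else 0.0
        sd = ranked[0] - second             # step (1E): score deviation
        if sd > block_threshold:            # step (1F)
            return "forbidden"
        if sd < bypass_threshold(matches):  # step (1G)
            return "bypass"
    return "bypass"                         # step (1H): end of page reached
```

In this sketch a page is labeled as soon as either threshold test succeeds, so the remainder of the page is not read; a page containing no keywords falls through to the final "bypass" label.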

2. The method for web content filtering according to claim 1, wherein said 
bypassing threshold is a function depending on the number of keyword matches, 
the web page filtering decision criteria further include a text classification 
category set C={c1, c2,..., c|C|} that users want to forbid, the relevance 
probability chart is built based on the text classification, and the score deviation 
is obtained by the following steps: 

(2A) initializing a category score S corresponding to every text category; 

(2B) computing the score in each category based on the relevance 
probability chart; 

(2C) choosing the two most significant scores; and 

(2D) setting the difference between them as the score deviation (SD). 
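
Steps (2A) to (2D) can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the name score_deviation, the chart layout {keyword: {category: probability}}, and the choice to sum probabilities per category are illustrative and are not specified by the claim.

```python
import heapq

def score_deviation(matched_keywords, chart, categories):
    """Return the gap between the two highest category scores (hypothetical)."""
    scores = {c: 0.0 for c in categories}             # step (2A)
    for keyword in matched_keywords:                   # step (2B)
        for category, p in chart.get(keyword, {}).items():
            if category in scores:
                scores[category] += p                  # assumed: additive scores
    top_two = heapq.nlargest(2, scores.values())       # step (2C)
    top_two += [0.0] * (2 - len(top_two))              # pad if fewer than 2 categories
    return top_two[0] - top_two[1]                     # step (2D)
```

A large deviation means one category clearly dominates the matched keywords; a small one means the evidence is spread across categories.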

3. The method for web content filtering according to claim 1, wherein said step 
(1A) further includes step (3A) building an interval threshold and initializing an 
interval value, said step (1E) further includes step (3B) computing an average 
interval of the object character in the information document as the interval value, 
and said step (1G) further includes step (3C) determining whether the interval 
value is larger than the interval threshold. 
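
One possible reading of steps (3B) and (3C) is sketched below. It assumes that the "object character" is a matched keyword and that the interval is the distance, in words, between consecutive matches; the claim does not define either term, and the function names are hypothetical.

```python
def average_interval(match_positions):
    """Mean gap between consecutive keyword positions, or None if undefined."""
    if len(match_positions) < 2:
        return None                       # fewer than two matches: no gap yet
    gaps = [b - a for a, b in zip(match_positions, match_positions[1:])]
    return sum(gaps) / len(gaps)          # step (3B): average interval

def exceeds_interval_threshold(match_positions, interval_threshold):
    """Step (3C): is the interval value larger than the interval threshold?"""
    interval = average_interval(match_positions)
    return interval is not None and interval > interval_threshold
```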

4. The method for web content filtering according to claim 1, wherein said 
keyword category and said relevance probability chart are obtained by the 
following steps: 

(4A) providing a testing document set D={d1, d2,..., d|D|}, each testing 
document di being formed by a word sequence V={w1, w2,..., w|V|}, and each 
text category cj including at least one testing document di; 

(4B) in the testing documents, based on the text category, computing a 
vocabulary probability P(wt|cj) of each word wt in the text category cj; 

(4C) in all text categories, based on the significance degree of the vocabulary 
probability, choosing a vocabulary set of a predetermined number of words as 
keywords; and 

(4D) building the relevance probability chart using the keywords and their 
corresponding probabilities. 
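
Steps (4A) to (4D) can be sketched as follows. The sketch assumes that P(wt|cj) is the unsmoothed relative frequency of word wt among all words of category cj, and that the "significance degree" of a word is its highest probability in any category; neither choice is stated in the claim, and build_chart is a hypothetical name.

```python
from collections import Counter

def build_chart(docs_by_category, num_keywords):
    """Build {keyword: {category: P(word | category)}} from labeled documents.

    docs_by_category: {category: [document, ...]}, each document a word list.
    """
    # Step (4B): relative word frequency within each category (assumed estimator).
    probs = {}
    for category, docs in docs_by_category.items():
        counts = Counter(word for doc in docs for word in doc)
        total = sum(counts.values())
        probs[category] = {w: n / total for w, n in counts.items()} if total else {}

    # Step (4C): rank words by their best per-category probability (assumed
    # significance measure), breaking ties alphabetically, and keep the top ones.
    vocabulary = {w for p in probs.values() for w in p}

    def significance(word):
        return max(p.get(word, 0.0) for p in probs.values())

    keywords = sorted(vocabulary, key=lambda w: (-significance(w), w))[:num_keywords]

    # Step (4D): the chart maps each keyword to its per-category probabilities.
    return {w: {c: p[w] for c, p in probs.items() if w in p} for w in keywords}
```

The resulting chart has the same keyword-to-category layout that a scoring routine can look up word by word while scanning a page.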

5. The method for web content filtering according to claim 1, wherein said 
contact node is a gateway system of a local network. 

6. The method for web content filtering according to claim 1, wherein said 
contact node is a gateway system of a personal computer. 